{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "afc9e675-3870-4174-8419-bab2534d852f",
   "metadata": {},
   "source": [
    "# ch2-使用大语言模型"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83ee014c-f070-474e-91b4-043fee28b2b3",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## 1.本节实战从4个方面演示使用大语言模型（LLM）的能力"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "add5f30a-3f60-4c44-b446-76b437e10ed8",
   "metadata": {
    "tags": []
   },
   "source": [
    "- <font size=3>阿里开源 `modelscope` 调用本地开源模型 `qwen2`</font>\n",
    "- <font size=3>huggingface `transformers` 调用本地开源模型 `chatglm3`</font>\n",
    "- <font size=3>通过Http API在线调用讯飞星火spark大模型</font>\n",
    "- <font size=3>通过Http API 调用ollama部署`qwen2`模型</font>\n",
    "\n",
    "### 还能学习\n",
    "- <font size=3>查看GPU</font>\n",
    "- <font size=3>流式输出</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74805eb2-da9c-4040-8577-29fdd386ced7",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## 2.查看GPU"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "998dc0fc-c017-4707-a89d-7faae0507ce2",
   "metadata": {
    "tags": []
   },
   "source": [
    "- <font size=3> `nvidia-smi` 查看机器上的nvidia显卡 </font>\n",
    "   - 驱动\n",
    "   - cuda\n",
    "   - 显卡型号\n",
    "   - 显存\n",
    "   - 利用率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "aa9653ff-1d3e-4327-8842-aa199d6a1e00",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
      "To disable this warning, you can either:\n",
      "\t- Avoid using `tokenizers` before the fork if possible\n",
      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fri Oct  4 16:04:01 2024       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|                               |                      |               MIG M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  TITAN RTX           Off  | 00000000:04:00.0 Off |                  N/A |\n",
      "| 40%   28C    P8     1W / 280W |  13896MiB / 24220MiB |      0%      Default |\n",
      "|                               |                      |                  N/A |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   1  TITAN RTX           Off  | 00000000:08:00.0 Off |                  N/A |\n",
      "| 41%   26C    P8    15W / 280W |  11586MiB / 24220MiB |      0%      Default |\n",
      "|                               |                      |                  N/A |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   2  TITAN RTX           Off  | 00000000:0C:00.0 Off |                  N/A |\n",
      "| 41%   25C    P8    12W / 280W |  23315MiB / 24220MiB |      0%      Default |\n",
      "|                               |                      |                  N/A |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   3  TITAN RTX           Off  | 00000000:0F:00.0 Off |                  N/A |\n",
      "| 41%   26C    P8    14W / 280W |  14865MiB / 24220MiB |      0%      Default |\n",
      "|                               |                      |                  N/A |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                                  |\n",
      "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
      "|        ID   ID                                                   Usage      |\n",
      "|=============================================================================|\n",
      "|    0   N/A  N/A      4784      C   ...a_v11/ollama_llama_server    11161MiB |\n",
      "|    0   N/A  N/A      6508      C   ...onda3/envs/llm/bin/python     2025MiB |\n",
      "|    0   N/A  N/A     36275      C   ...on-webui/venv/bin/python3      707MiB |\n",
      "|    1   N/A  N/A      4784      C   ...a_v11/ollama_llama_server    11583MiB |\n",
      "|    2   N/A  N/A      4784      C   ...a_v11/ollama_llama_server    11161MiB |\n",
      "|    2   N/A  N/A     50936      C   ...onda3/envs/llm/bin/python    12151MiB |\n",
      "|    3   N/A  N/A      4784      C   ...a_v11/ollama_llama_server    11647MiB |\n",
      "|    3   N/A  N/A     36275      C   ...on-webui/venv/bin/python3     3215MiB |\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "! nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0b42cfd-7c00-41ad-8769-9296b7e4da27",
   "metadata": {},
   "source": [
    "- 设置程序可见的显卡\n",
    "   - `os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'` 只使用第0和第1个显卡 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8546c5f0-6f94-4db9-b23d-3cdfa3dda835",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['CUDA_VISIBLE_DEVICES'] = '2'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b230969b-4542-41b8-bf50-9275fc5521b6",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## 3. 控制大语言模型的输出的随机性的参数"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d2c021b-3ce3-41b7-b770-e9ae1956bf5a",
   "metadata": {},
   "source": [
    "大语言模型预测下一个token时会先输出所有token的概率值，有不同的方法来控制选择哪一个token作为输出，主要以下4个参数\n",
    "\n",
    "- 温度（Temperature）: 起到平滑调整概率的作用，temperature=1时，原始概率保持不变，temperature<1时，原来概率大的会变得更大（概率集中效果），temperature>1时,概率值越平均\n",
    "\n",
    "- Top-K: 模型输出是在概率在top-k的范围里随机选择一个，K值越大，选择范围越广，生成的文本越多样；K值越小，选择范围越窄，生成的文本越趋向于高概率的词。 k=1就是直接选择最高概率的token输出\n",
    "\n",
    "- Top-p: 通过累积概率来限定范围，top-p=0.5表示随机采样的范围是概率在前50%的tokens， top-p选择的tokens数是动态的\n",
    "\n",
    "- max-tokens: max-tokens参数指定了模型在停止生成之前可以生成的最大token（或词）数量\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8455479e-2ed6-47f8-b9d9-9b94abfe6300",
   "metadata": {},
   "source": [
    "### $$\n",
    "[\n",
    "\\text{Softmax}_T(z)_i = \\frac{e^{\\frac{z_i}{T}}}{\\sum_{j} e^{\\frac{z_j}{T}}}\n",
    "]\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "57f63161-36d8-401d-8e00-125991627d79",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7e383f98-4489-4b8f-bb9a-50e07eebd9f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs = np.array([[2.0, 1.0, 0.1]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "077cf98e-4593-4ffd-8902-d0fa1fc7a5e9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "temperature = 1 [[0.65900114 0.24243297 0.09856589]]\n"
     ]
    }
   ],
   "source": [
    "temperature = 1\n",
    "logits = torch.tensor(inputs / temperature) \n",
    "softmax_scores = F.softmax(logits, dim=1)\n",
    "print(f\"temperature = {temperature} {softmax_scores.cpu().numpy()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0859aee4-4e12-425c-89ed-a8847bf8c495",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "temperature = 0.1 [[9.99954597e-01 4.53978684e-05 5.60254205e-09]]\n"
     ]
    }
   ],
   "source": [
    "temperature = 0.1\n",
    "logits = torch.tensor(inputs / temperature) \n",
    "softmax_scores = F.softmax(logits, dim=1)\n",
    "print(f\"temperature = {temperature} {softmax_scores.cpu().numpy()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8995456c-5417-4529-b3d5-d5387b372b79",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "temperature = 10 [[0.36605947 0.33122431 0.30271622]]\n"
     ]
    }
   ],
   "source": [
    "temperature = 10\n",
    "logits = torch.tensor(inputs / temperature) \n",
    "softmax_scores = F.softmax(logits, dim=1)\n",
    "print(f\"temperature = {temperature} {softmax_scores.cpu().numpy()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11a3681c-2226-46ee-9f84-908355afa821",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## 4. `modelscope` 调用本地开源模型 `qwen2`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed09e54d-ca53-4444-9e4f-96625e3e2227",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### Qwen2是由阿里云通义千问团队研发的新一代大型语言模型系列\n",
    "\n",
    "Qwen2系列提供了多个不同规模的模型，以满足不同场景和计算资源的需求，具体包括：\n",
    "\n",
    "* Qwen2-0.5B\n",
    "* Qwen2-1.5B\n",
    "* Qwen2-7B\n",
    "* Qwen2-57B-A14B（混合专家模型，MoE）\n",
    "* Qwen2-72B\n",
    "\n",
    "这些模型在参数数量上从数亿到数百亿不等，为用户提供了丰富的选择。\n",
    "\n",
    "\n",
    "* 相比前代模型Qwen1.5，Qwen2在代码、数学、推理、指令遵循、多语言理解等多个方面实现了性能的显著提升。\n",
    "* 特别是在超长上下文处理方面，Qwen2-72B-Instruct模型支持处理长达**128K tokens**的上下文，这在大型文档理解和复杂对话处理中尤为重要。\n",
    "\n",
    "在原有的中文和英文基础上，Qwen2新增了27种语言的高质量数据，使得模型在多语言处理上更加出色。\n",
    "\n",
    "Qwen2模型在**ModelScope**和**Hugging Face**平台上可以在线体验\n",
    "\n",
    "\n",
    "#### 下载\n",
    "- https://www.modelscope.cn/models/qwen/Qwen2-7B-Instruct/files\n",
    "\n",
    "``` shell\n",
    "git clone https://www.modelscope.cn/qwen/Qwen2-7B-Instruct.git\n",
    "```\n",
    "#### 安装\n",
    "\n",
    "``` shell\n",
    "# modelscope==1.16.1\n",
    "pip install modelscope\n",
    "\n",
    "# pip install optimum\n",
    "# pip install auto-gptq\n",
    "# import autogptq_cuda_64\n",
    "\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5085c28-9b6a-410e-a1b5-a95b4f695468",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 调用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "799781f7-041b-46cb-83f9-2945b122b809",
   "metadata": {},
   "outputs": [],
   "source": [
    "from modelscope import AutoModelForCausalLM, AutoTokenizer, GenerationConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "2fd81ff6-651b-46ca-82f6-9a3f205cc59a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/root/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/modeling.py:1365: UserWarning: Current model requires 704648448 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.\n",
      "  warnings.warn(\n",
      "Loading checkpoint shards: 100%|██████████| 4/4 [04:49<00:00, 72.36s/it]\n",
      "WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.\n",
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "model_path = './data/llm_app/llm/Qwen2-7B-Instruct/'\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(model_path,\n",
    "                                            device_map=\"auto\")\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path)\n",
    "gen_config = GenerationConfig.from_pretrained(model_path)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "26175902-c04b-4457-b26b-79b0be515dbb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input_ids': [108386, 11, 525, 498, 5394], 'attention_mask': [1, 1, 1, 1, 1]}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer(\"你好, are you ok\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3929a29b-87c7-46da-89bc-6397b2be5c44",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'你好'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.decode(108386)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e9e4633f-c147-483f-8cd9-27651295d4ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "','"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokenizer.decode(11)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "52c4e01e-e7fc-4921-8ba8-74b0fc1c49c5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GenerationConfig {\n",
       "  \"bos_token_id\": 151643,\n",
       "  \"do_sample\": true,\n",
       "  \"eos_token_id\": [\n",
       "    151645,\n",
       "    151643\n",
       "  ],\n",
       "  \"pad_token_id\": 151643,\n",
       "  \"repetition_penalty\": 1.05,\n",
       "  \"temperature\": 0.7,\n",
       "  \"top_k\": 20,\n",
       "  \"top_p\": 0.8\n",
       "}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gen_config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "5bc37167-23e6-403b-8867-254bb25c3240",
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_prompt(prompt, temperature=0.1, top_k=20, top_p=0.8, max_new_tokens=2048):\n",
    "    gen_config.temperature = temperature\n",
    "    gen_config.top_k = top_k\n",
    "    gen_config.top_p = top_p\n",
    "    \n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": \"\"},\n",
    "        {\"role\": \"user\", \"content\": prompt}\n",
    "    ]\n",
    "    \n",
    "    text = tokenizer.apply_chat_template(\n",
    "        messages,\n",
    "        tokenize=False,\n",
    "        add_generation_prompt=True\n",
    "    )\n",
    "    print(text)\n",
    "    \n",
    "    model_input = tokenizer([text], return_tensors='pt').to('cuda')\n",
    "    \n",
    "    generated_ids = model.generate(model_input.input_ids,\n",
    "                                  max_new_tokens=max_new_tokens,\n",
    "                                  generation_config=gen_config)\n",
    "    generated_ids = [\n",
    "        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_input.input_ids, generated_ids)\n",
    "    ]\n",
    "    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]\n",
    "    \n",
    "    return response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "0511bc73-0238-4a48-b795-9fd6721db4ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<|im_start|>system\n",
      "<|im_end|>\n",
      "<|im_start|>user\n",
      "hello<|im_end|>\n",
      "<|im_start|>assistant\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Hello! How can I assist you today?'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = 'hello'\n",
    "run_prompt(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6d11e9d-0d4e-4e26-85f8-665887244294",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 5. huggingface `transformers` 调用本地开源模型 `chatglm3`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23209fa3-1d84-4cdc-88bc-194fb898860c",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### ChatGLM全名General Language Model，是智谱AI自主研发的大型语言模型\n",
    "- 开源的产品 chatglm3-6B GLM-9B\n",
    "\n",
    "chatglm3模型在**ModelScope**和**Hugging Face**平台上可以在线体验\n",
    "\n",
    "\n",
    "#### 下载\n",
    "- https://www.modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/files\n",
    "\n",
    "``` shell\n",
    "git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b-32k.git\n",
    "```\n",
    "#### 安装\n",
    "\n",
    "``` shell\n",
    "# transformers==4.41.2\n",
    "pip install transformers\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d0c9a45-4404-4fd6-bcf2-d17e7a46531e",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 调用chatglm3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "59a49b40-2339-4ab9-849e-c14c612bca6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModel, AutoTokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "298627f2-8295-4511-a3ff-422aa5d8e49d",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_path = './data/llm_app/llm/chatglm3-6b-32k/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a39bcc1c-8e2c-424e-a4ab-6a6aece4c711",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]/root/anaconda3/envs/llm/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
      "  return self.fget.__get__(instance, owner)()\n",
      "Loading checkpoint shards: 100%|██████████| 7/7 [02:01<00:00, 17.30s/it]\n"
     ]
    }
   ],
   "source": [
    "tokenizer = AutoTokenizer.from_pretrained(model_path,\n",
    "                                         trust_remote_code=True)\n",
    "\n",
    "model = AutoModel.from_pretrained(model_path,\n",
    "                                 device_map=\"auto\",\n",
    "                                 trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "38513e43-350d-424e-a04e-157b8bcd6581",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = model.half().eval()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a1e5a3bf-7b44-4e0e-8632-f6aba215e6b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def chatglm(prompt, history=[], model=model, tokenizer=tokenizer):\n",
    "    response, history = model.chat(tokenizer, \n",
    "                                   prompt , \n",
    "                                   history=[], \n",
    "                                   temperature=0.1, top_p=0.8, top_k=20,\n",
    "                                   max_length=8192)\n",
    "    return response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2bc1f2cb-373e-455b-95a8-c65cada018a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'你好！很高兴见到你，欢迎问我任何问题。'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chatglm(\"你好\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "25c178bb-7869-4f7c-a342-868ccce0f26a",
   "metadata": {},
   "outputs": [],
   "source": [
    "def chatglm_stream(prompt, history=[], model=model, tokenizer=tokenizer):\n",
    "    for data in model.stream_chat(tokenizer, \n",
    "                                   prompt , \n",
    "                                   history=[], \n",
    "                                   temperature=0.1, top_p=0.8, top_k=20,\n",
    "                                   max_length=8192):\n",
    "        yield data[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "71c8a3a5-0b96-47c0-824b-318bb3ceaf68",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import clear_output\n",
    "user_prompt = \"你好, 请介绍下自己\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ee3eaeeb-860d-4ba2-907f-8961682789ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "你好！我是 ChatGLM3-6B，是清华大学KEG实验室和智谱AI公司共同训练的语言模型。我的目标是通过回答用户提出的问题来帮助他们解决问题。由于我是一个计算机程序，所以我没有自我意识，也不能像人类一样感知世界。我只能通过分析我所学到的信息来回答问题。\n"
     ]
    }
   ],
   "source": [
    "for res in chatglm_stream(user_prompt):\n",
    "    clear_output(wait=True)\n",
    "    print(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1dafb22d-b3b2-44b7-802d-c97ca7bc306e",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 封装到langchain LLM里\n",
    "LangChain是一个开源框架，它通过提供一系列工具、套件和接口，使开发者能够使用语言模型来实现各种复杂的任务，如文本到图像的生成、文档问答、聊天机器人等。LangChain简化了LLM应用程序生命周期的各个阶段，包括开发、生产化和部署。\n",
    "\n",
    "LangChain具有六大核心组件，这些组件相互协作，形成一个强大而灵活的系统：\n",
    "\n",
    "1. **模型（Models）**：包含各大语言模型的LangChain接口和调用细节，以及输出解析机制。\n",
    "2. **提示模板（Prompts）**：使提示工程流线化，进一步激发大语言模型的潜力。\n",
    "3. **数据检索（Indexes）**：构建并操作文档的方法，接受用户的查询并返回最相关的文档，轻松搭建本地知识库。\n",
    "4. **记忆（Memory）**：通过短时记忆和长时记忆，在对话过程中存储和检索数据，增强ChatGPT等聊天机器人的记忆能力。\n",
    "5. **链（Chains）**：LangChain中的核心机制，以特定方式封装各种功能，并通过一系列的组合，自动而灵活地完成任务。\n",
    "6. **代理（Agents）**：通过“代理”让大模型自主调用外部工具和内部工具，使智能Agent成为可能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "7241313b-b715-4fe0-a84b-0dd6ff0b361d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms.base import LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f108d685-5efc-405a-8f87-825282be7ee0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModel\n",
    "from typing import Any, List, Optional\n",
    "from langchain.callbacks.manager import CallbackManagerForLLMRun\n",
    "class ChatGLM(LLM):\n",
    "    tokenizer : AutoTokenizer = None\n",
    "    model: AutoModelForCausalLM = None\n",
    "\n",
    "    def __init__(self, model, tokenizer):\n",
    "        super().__init__()\n",
    "        self.tokenizer = tokenizer\n",
    "        self.model = model\n",
    "    def _call(self, prompt : str, stop: Optional[List[str]] = None,\n",
    "                run_manager: Optional[CallbackManagerForLLMRun] = None,\n",
    "                **kwargs: Any):\n",
    "        response, history = self.model.chat(self.tokenizer, prompt , history=[], temperature=0.1, top_p=0.8)\n",
    "        return response\n",
    "    @property\n",
    "    def _llm_type(self) -> str:\n",
    "        return \"chatglm\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "6d4d0139-25f4-4c62-898a-e2380ec190f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "chatglm = ChatGLM(model, tokenizer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "410a4225-297f-4fb4-ace9-b1afe2aacd34",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_30461/1897613078.py:1: LangChainDeprecationWarning: The method `BaseLLM.__call__` was deprecated in langchain-core 0.1.7 and will be removed in 1.0. Use invoke instead.\n",
      "  chatglm(\"你好\")\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'你好！很高兴见到你，欢迎问我任何问题。'"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chatglm(\"你好\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c17f294d-311b-42ab-8352-dd0db0db9be5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "0fdebd9d-a83f-4ecf-8a4b-33768f9574d3",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 6.通过Http API在线调用讯飞星火spark大模型"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c416750-f9fc-47fb-9536-65233d8fa915",
   "metadata": {
    "tags": []
   },
   "source": [
    "星火大模型API当前有Lite、Pro、Pro-128K、Max和4.0 Ultra五个版本，各版本独立计量tokens\n",
    "``` shell\n",
    "\n",
    "# python\n",
    "pip install --upgrade spark_ai_python\n",
    "\n",
    "# http\n",
    "pip install requets\n",
    "```\n",
    "- https://www.xfyun.cn/doc/spark/HTTP%E8%B0%83%E7%94%A8%E6%96%87%E6%A1%A3.html\n",
    "\n",
    "- **参数**\n",
    "\n",
    "![](./data/sparkai.png)\n",
    "\n",
    "- **key**\n",
    "\n",
    "**https://console.xfyun.cn/services/bm35https://console.xfyun.cn/services/bm35**\n",
    "\n",
    "![](./data/sparkkey.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2efa87b3-99a0-4db0-9d17-b176dda67b42",
   "metadata": {},
   "source": [
    "### python 调用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "035f974c-3d11-4f63-abe0-cfd206b3a20b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from sparkai.llm.llm import ChatSparkLLM, ChunkPrintHandler\n",
    "from sparkai.core.messages import ChatMessage\n",
    "from config import api_secret, api_key\n",
    "\n",
    "appid = \"5541c544\"     #填写控制台中获取的 APPID 信息\n",
    "\n",
    "domain = \"generalv3\"   # v1.5版本\n",
    "spark_url = \"ws://spark-api.xf-yun.com/v3.1/chat\"  # v2.0环境的地址\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "c422df61-2c47-47eb-9954-8fb9c426e586",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sparkai.log.logger import logger\n",
    "from sparkai.core.callbacks import StdOutCallbackHandler\n",
    "spark = ChatSparkLLM(\n",
    "    spark_api_url=spark_url,\n",
    "    spark_app_id=appid,\n",
    "    spark_api_key=api_key,\n",
    "    spark_api_secret=api_secret,\n",
    "    spark_llm_domain=domain,\n",
    "    streaming=False,\n",
    "    max_tokens= 1024,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "37357f82-c838-4fe9-81f4-2597537ca937",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "您好，我是科大讯飞研发的认知智能大模型，我的名字叫讯飞星火认知大模型。我可以和人类进行自然交流，解答问题，高效完成各领域认知智能需求。\n"
     ]
    }
   ],
   "source": [
    "prompt = '你是谁？'\n",
    "\n",
    "messages = [ChatMessage(\n",
    "        role=\"user\",\n",
    "        content=prompt\n",
    "    )]\n",
    "\n",
    "\n",
    "history_msg = []\n",
    "history = []\n",
    "if len(history) != 0 :\n",
    "    history_msg = [ChatMessage(role=msg[0], content=msg[1]) for msg in history]\n",
    "    messages = history_msg + [ChatMessage(role=\"user\", content=messages[0]['content'])]\n",
    "\n",
    "handler = ChunkPrintHandler()\n",
    "a = spark.generate([messages], callbacks=[handler])\n",
    "print(a.generations[0][0].text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "180ca2bf-9551-4205-8b8d-3f371b1a7c75",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ea3f0dcb-ae4e-406c-b48c-611fb1bfed6a",
   "metadata": {},
   "source": [
    "### spark http"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "643f5449-5044-4777-bdde-94cae5b60c8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import json\n",
    "from IPython.display import clear_output\n",
    "from config import http_key\n",
    "\n",
    "url = \"https://spark-api-open.xf-yun.com/v1/chat/completions\"\n",
    "data = {\n",
    "        \"model\": \"generalv3\", # 指定请求的模型\n",
    "        \"messages\": [\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": \"你是谁\"\n",
    "            }\n",
    "        ],\n",
    "        \"stream\": True\n",
    "    }\n",
    "header = {\"Authorization\": f\"Bearer {http_key}\"}\n",
    "response = requests.post(url, headers=header, json=data, stream=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "b415bd7b-429a-49a9-ae97-0a2f7564e705",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "您好，我是科大讯飞研发的认知智能大模型，我的名字叫讯飞星火认知大模型。我可以和人类进行自然交流，解答问题，高效完成各领域认知智能需求。"
     ]
    }
   ],
   "source": [
    "response.encoding = \"utf-8\"\n",
    "info = \"\"\n",
    "for line in response.iter_lines(decode_unicode=\"utf-8\"):\n",
    "    if 'data' in line:\n",
    "        if 'DONE' in line:\n",
    "            continue\n",
    "        res = line.strip().replace('data: ', '')\n",
    "        data = json.loads(res)\n",
    "        cur_info = data['choices'][0]['delta']['content']\n",
    "        print(cur_info, end='')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e781a9ee-b943-4d33-a3fd-8c55345195bd",
   "metadata": {
    "tags": []
   },
   "source": [
    "### openai http"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "5e9916d4-6f20-40aa-b8d1-7280a11aea78",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "生成的文本：\n",
      "您好，作为一个认知智能模型。我的主要功能是提供信息查询、日程管理、智能推荐等服务。无论您需要解答哪种问题，或者需要完成哪种任务，只要告诉我，我都会尽我所能帮助您。同时，我还在不断学习和进步中，希望能更好地为您服务。"
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "client = OpenAI(\n",
    "        api_key=f\"{http_key}\", # APIPassword\n",
    "        base_url = 'https://spark-api-open.xf-yun.com/v1' # 指向讯飞星火的请求地址\n",
    "    )\n",
    "response = client.chat.completions.create(\n",
    "    model='generalv3', # 指定请求的版本\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": \"你是一个乐于助人的助手。\"},\n",
    "        {\"role\": \"user\", \"content\": \"介绍下你自己\"}\n",
    "    ],\n",
    "    temperature=0.7,  # 控制生成文本的随机性。越低越确定，越高越随机。\n",
    "    top_p=0.9,       # 核采样 (Nucleus sampling)。top_p=0.9表示只考虑概率质量前90%的词。\n",
    "    max_tokens=4096,  # 最大生成的token数量。\n",
    "    stream=True      # 开启流式输出\n",
    ")\n",
    "print(\"生成的文本：\")\n",
    "\n",
    "for chunk in response:\n",
    "    print(chunk.choices[0].delta.content, end='', flush=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a71fb7ca-0254-4fb0-a966-a2e679ad88d2",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 7.ollama(cpu/gpu)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36a539a9-77a4-494b-a29c-9a4b68bc1110",
   "metadata": {
    "tags": []
   },
   "source": [
    "Ollama是一个集成了多种大型语言模型的工具，它支持模型的部署、运行以及API的整合和调用\n",
    "\n",
    "- 安装Ollama：\n",
    "```shell\n",
    "curl -fsSL https://ollama.com/install.sh | sh\n",
    "```\n",
    "\n",
    "- 验证安装：\n",
    "```shell\n",
    "# 输入来验证安装是否成功。\n",
    "ollama --version\n",
    "```\n",
    "\n",
    "- 使用\n",
    "\n",
    "``` shell\n",
    "# 启动服务\n",
    "ollama serve\n",
    "\n",
    "# 运行模型\n",
    "\n",
    "ollama run qwen2:70b\n",
    "\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "04acd649-f930-43c1-b8ed-4422e87a1107",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "我是来自阿里云的大规模语言模型，我叫通义千问。我是阿里云自主研发的超大规模语言模型，也能够生成与给定词语相关的同义词或短语，帮助用户丰富表达和拓宽思路。如果您有任何问题或需要帮助，请随时告诉我，我会尽力提供支持。"
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "client = OpenAI(\n",
    "    base_url='http://localhost:11434/v1/',\n",
    "    api_key='qwen2:72b',\n",
    ")\n",
    "chat_completion = client.chat.completions.create(\n",
    "    messages=[\n",
    "        {\n",
    "            'role': 'user',\n",
    "            'content': '介绍下你自己？',\n",
    "        }\n",
    "    ],\n",
    "    max_tokens=4096,  # 最大生成的token数量。\n",
    "    stream=True,      # 开启流式输出\n",
    "    model='qwen2:72b',\n",
    "    temperature=0.7,  # 控制生成文本的随机性。越低越确定，越高越随机。\n",
    "    top_p=0.9,\n",
    ")\n",
    "for chunk in chat_completion:\n",
    "    print(chunk.choices[0].delta.content, end='', flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm",
   "language": "python",
   "name": "llm"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
